This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to uniform quantization. Our main finding is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. In addition, we study compressed sensing where both the covariate (i.e., sensing vector) and the response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy near-optimal error bounds. Moreover, using a concentration inequality for product processes and a covering argument, we establish a near-minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.
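The truncate, dither, then quantize pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the dither distribution or the quantizer, so the uniform dither on [-delta/2, delta/2], the midpoint quantizer, and all function names and parameters (`tau`, `delta`) below are assumptions made for illustration, not the authors' exact scheme.

```python
import math
import random


def truncate(x: float, tau: float) -> float:
    """Clip x to [-tau, tau] so heavy-tailed outliers become bounded."""
    return max(-tau, min(tau, x))


def uniform_quantize(a: float, delta: float) -> float:
    """Uniform quantizer with resolution delta: map a to the midpoint of its bin."""
    return delta * (math.floor(a / delta) + 0.5)


def quantize_sample(x: float, tau: float, delta: float, rng: random.Random) -> float:
    """Truncate x, add a uniform dither (an assumed choice), then quantize."""
    dither = rng.uniform(-delta / 2, delta / 2)
    return uniform_quantize(truncate(x, tau) + dither, delta)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [quantize_sample(0.3, tau=5.0, delta=1.0, rng=rng) for _ in range(20000)]
    # Every output lies on the grid delta * (k + 1/2); the average tracks the input.
    print(sorted(set(draws)), sum(draws) / len(draws))
```

With a uniform dither of this width, the quantized value is an unbiased estimate of the truncated input, which is why averaging many quantized samples can still recover the underlying quantity; the truncation threshold `tau` controls the bias introduced by clipping.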
translated by Google Translate
Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Human modeling and relighting are two fundamental problems in computer vision and graphics, where high-quality datasets can largely facilitate related research. However, most existing human datasets only provide multi-view human images captured under the same illumination. Although valuable for modeling tasks, they are not readily usable for relighting problems. To promote research in both fields, in this paper, we present UltraStage, a new 3D human dataset that contains more than 2K high-quality human assets captured under both multi-view and multi-illumination settings. Specifically, for each example, we provide 32 surrounding views illuminated with one white light and two gradient illuminations. In addition to regular multi-view images, gradient illuminations help recover detailed surface normal and spatially-varying material maps, enabling various relighting applications. Inspired by recent advances in neural representation, we further convert each example into a neural human asset which allows novel view synthesis under arbitrary lighting conditions. We show that our neural human assets can achieve extremely high capture performance and are capable of representing fine details such as facial wrinkles and cloth folds. We also validate UltraStage in single-image relighting tasks, training neural networks with virtual relit data from neural assets and demonstrating realistic rendering improvements over prior art. UltraStage will be publicly available to the community to stimulate significant future developments in various human modeling and rendering tasks.
We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it to simulate 2+1D lattice compact quantum electrodynamics with finite density dynamical fermions. The gauge field is represented by a neural network which parameterizes a discretized flow-based transformation of the amplitude while the fermionic sign structure is represented by a neural net backflow. This approach directly represents the $U(1)$ degree of freedom without any truncation, obeys Gauss's law by construction, samples autoregressively avoiding any equilibration time, and variationally simulates Gauge-Fermion systems with sign problems accurately. In this model, we investigate confinement and string breaking phenomena in different fermion density and hopping regimes. We study the phase transition from the charge crystal phase to the vacuum phase at zero density, and observe the phase separation and the net charge penetration blocking effect under magnetic interaction at finite density. In addition, we investigate a magnetic phase transition due to the competition between the kinetic energy of fermions and the magnetic energy of the gauge field. With our method, we further note potential differences in the order of the phase transitions between a continuous $U(1)$ system and one with finite truncation. Our state-of-the-art neural network approach opens up new possibilities to study different gauge theories coupled to dynamical matter in higher dimensions.
Along with the widespread use of face recognition systems, their vulnerability has come into focus. While existing face anti-spoofing methods can be generalized between attack types, generic solutions are still challenging due to the diversity of spoof characteristics. Recently, the spoof trace disentanglement framework has shown great potential for coping with both seen and unseen spoof scenarios, but the performance is largely restricted by the single-modal input. This paper focuses on this issue and presents a multi-modal disentanglement model which learns polysemantic spoof traces in a targeted manner for more accurate and robust generic attack detection. In particular, based on the adversarial learning mechanism, a two-stream disentangling network is designed to estimate spoof patterns from the RGB and depth inputs, respectively. In this way, it captures complementary spoofing clues inherent in different attacks. Furthermore, a fusion module is exploited, which recalibrates both representations at multiple stages to promote the disentanglement in each individual modality. It then performs cross-modality aggregation to deliver a more comprehensive spoof trace representation for prediction. Extensive evaluations are conducted on multiple benchmarks, demonstrating that learning polysemantic spoof traces favorably contributes to anti-spoofing with more perceptible and interpretable results.
In unsupervised extractive summarization, learning high-quality sentence representations is essential for selecting salient sentences from the input document. Previous studies focus more on employing statistical approaches or pre-trained language models (PLMs) to extract sentence embeddings, while ignoring the rich information inherent in the heterogeneous types of interaction between words and sentences. In this paper, we are the first to propose an unsupervised extractive summarization method with heterogeneous graph embeddings (HGEs) for Chinese documents. A heterogeneous text graph is constructed to capture different granularities of interactions by incorporating graph structural information. Moreover, our proposed graph is general and flexible, so additional nodes such as keywords can be easily integrated. Experimental results demonstrate that our method consistently outperforms the strong baseline on three summarization datasets.
Since the recent success of Vision Transformers (ViTs), explorations toward transformer-style architectures have triggered the resurgence of modern ConvNets. In this work, we explore the representation ability of DNNs through the lens of interaction complexities. We empirically show that interaction complexity is an overlooked but essential indicator for visual recognition. Accordingly, a new family of efficient ConvNets, named MogaNet, is presented to pursue informative context mining in pure ConvNet-based models, with preferable complexity-performance trade-offs. In MogaNet, interactions across multiple complexities are facilitated and contextualized by leveraging two specially designed aggregation blocks in both spatial and channel interaction spaces. Extensive studies are conducted on ImageNet classification, COCO object detection, and ADE20K semantic segmentation tasks. The results demonstrate that our MogaNet establishes new state-of-the-art over other popular methods in mainstream scenarios and all model scales. Typically, the lightweight MogaNet-T achieves 80.0\% top-1 accuracy with only 1.44G FLOPs using a refined training setup on ImageNet-1K, surpassing ParC-Net-S by 1.4\% accuracy but saving 59\% (2.04G) FLOPs.
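We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed Neural Marionette, can produce high-quality and meaningful motions with smooth transitions from simple user input, including action tags with expected action durations and, optionally, additional input if the user specifies it. The core of our system is a novel Transformer-based motion generation model, namely Marionet, which can generate diverse motions given action tags. Unlike existing motion generation models, Marionet exploits contextual information from the past motion clip and the future action tag, and is dedicated to generating motions that blend smoothly with historical and future actions. Specifically, Marionet first encodes the target action tag and the contextual information into an action-level latent code. A time unrolling module then unfolds this code into frame-level control signals, which can be combined with other frame-level control signals such as the target trajectory. Motion frames are then generated in an auto-regressive manner. By applying Marionet sequentially, the Neural Marionette system can robustly generate long-term, multi-action motions with the help of two simple schemes, namely "Shadow Start" and "Action Revision". Along with the novel system, we also present a new dataset dedicated to the multi-action motion synthesis task, which contains both action tags and their contextual information. Extensive experiments are conducted to study the action accuracy, naturalism, and transition smoothness of the motions generated by our system.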
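Person search is an integrated task composed of multiple sub-tasks, such as foreground/background classification, bounding box regression, and person re-identification. Person search is therefore a typical multi-task learning problem, especially when solved in an end-to-end manner. Recently, some works have enhanced person search by exploiting various auxiliary information, e.g., person joint keypoints, body part positions, and attributes, which brings in more tasks and makes the person search model more complex. The inconsistent convergence rates of the individual tasks may harm model optimization. A straightforward solution is to manually assign different weights to different tasks to compensate for the differing convergence rates. However, given the special case of person search, i.e., a large number of tasks, weighting the tasks manually is impractical. To this end, we propose a Grouped Adaptive Loss Weighting (GALW) method, which adjusts the weight of each task automatically and dynamically. Specifically, we group tasks according to their convergence rates. Tasks within the same group share the same learnable weight, which is dynamically assigned by considering the loss uncertainty. Experimental results on two typical benchmarks, CUHK-SYSU and PRW, demonstrate the effectiveness of our method.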
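The grouping and shared-weight idea can be sketched as follows. The abstract does not give formulas, so everything here is an assumption made for illustration: the relative loss decrease used as a convergence-rate proxy, the equal-size grouping rule, and the uncertainty-style objective `exp(-s) * loss + s` with one learnable log-variance `s` per group. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def convergence_rate(prev_loss: float, curr_loss: float) -> float:
    """Relative loss decrease between two checkpoints (an assumed proxy)."""
    return (prev_loss - curr_loss) / prev_loss


def group_tasks(rates: dict, num_groups: int) -> list:
    """Sort tasks by convergence rate and split them into contiguous groups."""
    ordered = sorted(rates, key=rates.get)
    size = math.ceil(len(ordered) / num_groups)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def grouped_loss(losses: dict, groups: list, log_vars: list) -> float:
    """Total loss where every task in a group shares one log-variance s."""
    total = 0.0
    for tasks, s in zip(groups, log_vars):
        total += sum(math.exp(-s) * losses[t] + s for t in tasks)
    return total


def log_var_grads(losses: dict, groups: list, log_vars: list) -> list:
    """Gradient of grouped_loss with respect to each group's log-variance."""
    return [
        len(tasks) - math.exp(-s) * sum(losses[t] for t in tasks)
        for tasks, s in zip(groups, log_vars)
    ]


if __name__ == "__main__":
    # Hypothetical task names and values, chosen only to exercise the functions.
    rates = {"cls": 0.5, "reg": 0.1, "reid": 0.05, "kpt": 0.45}
    losses = {"cls": 1.0, "reg": 2.0, "reid": 4.0, "kpt": 1.0}
    groups = group_tasks(rates, num_groups=2)
    log_vars = [0.0, 0.0]
    print(groups, grouped_loss(losses, groups, log_vars))
    print(log_var_grads(losses, groups, log_vars))
```

Under this assumed objective, each group's log-variance settles at the logarithm of the group's mean loss, so groups with larger losses receive smaller effective weights `exp(-s)`. Sharing one parameter per group keeps the number of learnable weights small even when the number of tasks is large.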